BMC Genomic Data — Latest Matching Preprints

1

Expression-dependent but strand-independent synonymous single-nucleotide polymorphism in the Escherichia coli chromosome

Deka, N.; Beura, P. K.; Sen, P.; Aziz, R.; Kashyap, A.; Keot, D.; Jain, M.; Namsa, N. D.; Deka, R. C.; Feil, E.; Satapathy, S. S.; Ray, S. K.

2026-05-26 evolutionary biology 10.64898/2026.05.22.727198 medRxiv

Top 0.1%

1.9%

Background: Mutation is thought to arise mainly during replication, though transcription is also known to be mutagenic. Despite recent reports of genome-wide transcription-induced mutagenesis, it has yet to be clearly established whether specific mutations in genomes are replication-dependent, transcription-dependent, or both. Here, we studied synonymous single-nucleotide polymorphisms (SNPs) in 2091 individual coding sequences (CDS) in the leading strand (LeS) and the lagging strand (LaS) of the Escherichia coli chromosome by comparison across 157 strains. The frequencies of complementary transitions (ti) and complementary transversions (tv) were compared in each CDS to assess violations of strand parity. Results: The C→T and G→A transitions exhibited the highest frequency as well as the most prominent strand inequality, as these transitions were influenced both by strand and by expression. Interestingly, the inequality between T→C and A→G was expression-dependent but strand-independent. The A→T and G→T transversions were universally more frequent than their complementary T→A and C→A transversions, respectively. Conclusions: Our study demonstrates strand-independent but expression-dependent synonymous SNP inequality in CDS, supporting a contribution of transcription-induced mutagenesis to strand inequality in the E. coli chromosome.
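Strand parity here means that a substitution and its complementary counterpart (for example C→T and G→A) occur at equal frequency. A minimal Python sketch of one way to score a departure from parity within a single CDS is shown below. It assumes per-CDS counts of synonymous SNPs keyed as 'REF>ALT' on the coding strand and uses a normalized skew (a − b)/(a + b); this statistic and the names `complementary` and `parity_skew` are illustrative assumptions, not the authors' stated method.

```python
from collections import Counter

# Watson-Crick complements used to pair each substitution with the same
# change read on the opposite strand (e.g. C>T on one strand is G>A on the other).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary(sub):
    """Return the complementary substitution, e.g. 'C>T' -> 'G>A'."""
    ref, alt = sub.split(">")
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def parity_skew(counts, sub):
    """Normalized skew (a - b) / (a + b) between `sub` and its complement.

    `counts` maps 'REF>ALT' (read on the coding strand of one CDS) to SNP counts.
    The skew formula is an illustrative assumption: 0 means parity, positive
    values mean `sub` is more frequent. Returns None if neither type occurs.
    """
    a = counts.get(sub, 0)
    b = counts.get(complementary(sub), 0)
    if a + b == 0:
        return None
    return (a - b) / (a + b)


# Hypothetical counts for a single CDS.
cds_counts = Counter({"C>T": 30, "G>A": 10, "A>T": 6, "T>A": 2})
for sub in ("C>T", "A>T", "T>C"):
    print(sub, "vs", complementary(sub), parity_skew(cds_counts, sub))
```

Comparing skews between leading-strand and lagging-strand CDS, or between high- and low-expression CDS, would be one way to separate strand effects from expression effects.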

2

The role of long-range transcriptional regulation in interpretation of non-coding variants associated with human disease

Mandic, K.; Hrsak, D.; Uljanic, F.; Lenhard, B.; Baresic, A.

2026-06-17 genomics 10.64898/2026.06.15.731051 medRxiv

Top 0.1%

1.3%

Genome-wide association studies (GWAS) are key tools for discovering associations between single nucleotide polymorphisms (SNPs) and phenotypic traits and have been successfully applied to many diseases and disorders. However, a major challenge is identifying the gene affected by non-coding SNPs, especially when the gene is distal in terms of genomic distance. In this study, we present a novel approach, named targPred, which uses genomic regulatory blocks (GRBs) to infer a connection between a given SNP or locus and a target gene located in the same GRB, in a more robust and generalisable manner. We found that many disease traits, such as cancer and psychiatric disease, have a propensity for long-range regulation. Furthermore, we showcase a childhood obesity locus that is connected to the distal BDNF gene. Finally, we propose a new web-based service based on enhancer-promoter associations to facilitate finding the causal genes for a wide array of traits and conditions.
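The core idea, linking a non-coding SNP to genes that share its GRB, reduces to an interval lookup. The sketch below assumes GRBs are supplied as non-overlapping half-open intervals and that each gene is represented by a single TSS coordinate; the `Block` type, the `genes_in_same_block` function, and all coordinates and gene names are hypothetical, and targPred's actual inference and scoring are not reproduced here.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """A genomic regulatory block as a half-open interval [start, end)."""
    chrom: str
    start: int
    end: int


def genes_in_same_block(snp, blocks, gene_tss):
    """Return sorted names of genes whose TSS lies in the GRB containing the SNP.

    snp: (chrom, position); gene_tss: gene name -> (chrom, TSS position).
    Assumes GRBs do not overlap, so the first containing block is the only one.
    Returns an empty list if the SNP falls outside every block.
    """
    chrom, pos = snp
    for block in blocks:
        if block.chrom == chrom and block.start <= pos < block.end:
            return sorted(
                gene
                for gene, (g_chrom, tss) in gene_tss.items()
                if g_chrom == chrom and block.start <= tss < block.end
            )
    return []


# Hypothetical coordinates and gene names, for illustration only.
blocks = [Block("chr1", 100, 200), Block("chr1", 500, 900)]
genes = {"GENE_A": ("chr1", 120), "GENE_B": ("chr1", 190), "GENE_C": ("chr1", 650)}
print(genes_in_same_block(("chr1", 150), blocks, genes))
print(genes_in_same_block(("chr1", 300), blocks, genes))
```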

3

MYC and RNA Polymerase II Binding Near Transcriptional End Sites Regulate the Expression of Functionally-Related Genes

Prochownik, E. V.; Henchy, C. M.; Wang, H.

2026-06-26 bioinformatics 10.64898/2026.06.22.733817 medRxiv

Top 0.1%

1.1%

MYC oncoprotein binding at promoters and enhancers influences RNA polymerase II (RNAPII)-driven gene expression. MYC also binds near the transcriptional end sites (TESs) of numerous genes. This often allows direct promoter-TES contact via looping and further regulates total and 'read-through' transcription that extends beyond standard termination sites. Here we aimed to clarify the rules governing cross-talk between TES-associated MYC and/or RNAPII binding in human and murine cells. Using ChIP-seq and RNA-seq datasets from the ENCODE portal and elsewhere, we found that MYC and RNAPII binding profiles differed between TESs and transcriptional start sites (TSSs). Variations in E-box flanking sequences likely accounted for the somewhat lower affinities of MYC for TES-associated sites. Motifs for numerous other transcription factors were also observed to cluster non-randomly and in close proximity to the summits of MYC and RNAPII binding peaks. On average, genes with TES-proximal MYC or RNAPII sites were more highly expressed than those without, although co-binding tended to be suppressive. Both normal and neoplastic proliferative stimuli altered the MYC and RNAPII binding patterns of many genes, indicating that 'category switching' was common, subject to disparate external signals and often reversible. Functionally related gene sets with high levels of read-through transcription were uniformly marked by significant amounts of TES-associated MYC and/or RNAPII binding. These findings indicate that, both independently and together, MYC and RNAPII binding near TESs dynamically impact total and read-through transcription while also coordinating the expression of many common-purpose gene sets.

4

Increased chromatin accessibility following 1α,25-dihydroxyvitamin D3 treatment in human endometrial stromal cells

Yi, M.; Bostan, H.; DeMayo, F. J.

2026-05-09 molecular biology 10.64898/2026.05.06.723064 medRxiv

Top 0.1%

0.9%

Vitamin D signaling has recognized roles in female reproductive physiology, but its effects at the chromatin level in endometrial stromal cells are still unclear. Here, we investigated how the active form of vitamin D, 1,25-dihydroxyvitamin D3, or calcitriol, influences the accessible chromatin landscape of human endometrial stromal cells. Assay for transposase-accessible chromatin using sequencing (ATAC-seq) was performed on T-HESCs treated with either a vehicle or 1,25(OH)2D3. Ligand treatment increased overall chromatin accessibility, shown by higher ATAC-seq signal intensity, while causing only minor changes in the total number of called peaks. Peak annotation revealed that accessible regions were spread across both promoter-proximal and distal genomic areas. Integrating these data with CUT&RUN and RNA sequencing showed that most vitamin D-responsive cistromic modifications and transcripts were linked to nearby open chromatin, though fewer were associated with regions that were significantly differentially accessible. These results suggest that 1,25(OH)2D3-dependent transcription mainly occurs within a permissive, pre-accessible chromatin environment. This study offers new evidence that active vitamin D influences the epigenomic landscape of human endometrial stromal cells, establishing the chromatin-based molecular response to a chemically defined VDR ligand, 1,25(OH)2D3, relevant to stromal differentiation and preparation for decidualization.
Highlights:
- First evidence suggesting that active vitamin D (1,25-dihydroxyvitamin D3, 1,25(OH)2D3) directly enhances the signal intensity of chromatin accessibility in human endometrial stromal cells
- Most accessible chromatin regions were shared between vehicle- and ligand-treated human endometrial stromal cells
- 1,25(OH)2D3-responsive transcription occurs largely within pre-accessible chromatin in human endometrial stromal cells
- Assay for transposase-accessible chromatin sequencing (ATAC-seq) defines a chromatin-level pharmacologic response to a chemically defined VDR ligand in human endometrial stromal cells

5

Bipartite DNA binding domain of transcription factor BCL11B binds clustered short DNA sequence motifs

Lee, J.; Zhou, J.; Horton, J. R.; Yu, M.; Muoghalu, M. D.; Khan, F. A.; Zhang, X.; Huang, Y.; Blumenthal, R. M.; Zhang, X.; Cheng, X.

2026-05-02 biochemistry 10.64898/2026.05.01.721897 medRxiv

Top 0.2%

0.8%

B-cell leukemia/lymphoma 11B (BCL11B), despite its name, is a key regulator of T-cell development, specification, and T-cell malignancies. BCL11B contains a bipartite DNA binding domain composed of two C2H2 zinc finger arrays: low-affinity ZF2-3 and high-affinity ZF4-6. These arrays function as homotypic modules that recognize similar six-nucleotide motifs, TG(n)CC(c/t/a), as seven of the eight DNA base-contacting residues are conserved between them. The most conserved interactions involve GG dinucleotides, contacted by arginine and lysine residues at key base-interacting positions in ZF3 and ZF5. The two ZF arrays are connected by a long ~300-residue linker that provides flexibility in how the arrays engage DNA, allowing ZF2-3 and ZF4-6 to bind the same or opposite strands with variable orientation, spacing and positioning along the DNA. This extended linker is enriched in serine/threonine, acidic residues (aspartate/glutamate), and structural residues (glycine/proline), possibly providing additional layers of transcriptional regulation through post-translational modification, electrostatic modulation, and/or condensate formation. We also examined six missense mutations in base-interacting residues that are associated with neurodevelopmental disorders. Substitutions replacing bulky, positively charged arginine or lysine with smaller or hydrophobic residues likely reduce DNA-binding affinity and/or specificity, whereas substitutions between asparagine and lysine may alter base recognition preferences.
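The six-nucleotide motif TG(n)CC(c/t/a) can be searched on both strands with a regular expression, which is one simple way to locate the clustered short motifs that the two zinc finger arrays may engage with variable orientation and spacing. This sketch reads the lowercase positions as degenerate (n = any base; c/t/a = C, T, or A), which is an assumption about the notation, and `find_motifs` and `spacings` are hypothetical helpers, not the authors' analysis code.

```python
import re

# TG(n)CC(c/t/a) as a regex; the lookahead lets overlapping hits be reported.
MOTIF = re.compile(r"(?=TG[ACGT]CC[CTA])")
MOTIF_LEN = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motifs(seq):
    """Return sorted (start, strand) hits; starts are 0-based forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in MOTIF.finditer(seq)]
    for m in MOTIF.finditer(reverse_complement(seq)):
        # A hit at rc[i:i+6] covers seq[n-i-6 : n-i] on the forward strand.
        hits.append((n - m.start() - MOTIF_LEN, "-"))
    return sorted(hits)


def spacings(hits):
    """Distances between consecutive motif starts, a crude view of clustering."""
    starts = [s for s, _ in hits]
    return [b - a for a, b in zip(starts, starts[1:])]


# Hypothetical sequence with one forward-strand and one reverse-strand motif.
example = "AATGACCCTTGGTGGTCAGG"
hits = find_motifs(example)
print(hits, spacings(hits))
```

Reporting strand alongside position matters here because, according to the abstract, the two arrays can bind the same or opposite strands.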

6

The Effect of Vaccination on the Evolution of the SARS-CoV-2 B.1.351 Variant

Wang, Z.; Raeihle, M.; Braun-Gorman, S.; Leung, I.; Richards, C.; Gabbay, L.; Shamoon-Pour, M.

2026-05-08 molecular biology 10.64898/2026.05.06.723356 medRxiv

Top 0.2%

0.6%

Since the initial distribution of COVID-19 vaccines, their widespread use has been hypothesized to act as a selective pressure that drives the COVID-19 virus to mutate. This study aims to investigate the correlation between global vaccination rates and the mutation rate of the SARS-CoV-2 Beta variant (B.1.351). From January to July 2021, nucleotide diversity increased in tandem with vaccination rates, consistent with the virus evolving more rapidly in response to selective pressure from mass vaccination. Statistical analysis revealed significant positive correlations of both vaccination rates and vaccine doses administered with nucleotide diversity. Thus, our findings indicate a positive correlation between rising vaccination rates and nucleotide diversity, suggesting that increased vaccination coverage acted as a selective pressure that accelerated the evolution of SARS-CoV-2.
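Nucleotide diversity (π) is the average proportion of nucleotide sites that differ between two randomly chosen sequences in a sample. A minimal sketch for a set of aligned genomes follows; excluding gaps and Ns pair by pair (pairwise deletion) is an assumption, as the abstract does not say how π was computed or how sequences were grouped by time period.

```python
from itertools import combinations

SKIP = set("-N")  # gap and ambiguous-base symbols, excluded per pair (assumption)


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean proportion of differing sites over all pairs.

    `seqs` must be aligned (equal length). A site is compared for a pair only if
    neither sequence has a gap or N there (pairwise deletion).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    per_pair = []
    for s1, s2 in combinations(seqs, 2):
        compared = [(a, b) for a, b in zip(s1.upper(), s2.upper())
                    if a not in SKIP and b not in SKIP]
        if compared:
            diffs = sum(a != b for a, b in compared)
            per_pair.append(diffs / len(compared))
    if not per_pair:
        raise ValueError("no comparable sites")
    return sum(per_pair) / len(per_pair)


# Hypothetical toy alignment; real input would be one time bin of genomes.
print(nucleotide_diversity(["ACGT", "ACGA", "TCGA"]))
```

Computing π separately for each month of sampling would give a time series that can then be correlated with vaccination rates.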

7

Integrated epidemiology and toxicology reveals the protective effects of TMAO against chemical neurotoxicity in children

de Leeuw, V. C.; Maitre, L.; van Oostrom, C. T.; Renard-Dausset, E.; Anguita, A.; Chatzi, L.; Coen, M.; Grazuleviciene, R.; Heude, B.; Ibarluzea, J.; Julvez, J.; Keun, H. C.; Piersma, A. H.; Maria, L. S.; Marquez, S.; Ruiz-Rivera, M.; Subiza-Perez, M.; Brantsaeter, A. L.; Toledano, M. B.; Vrijheid, M.; Wright, J.; Hessel, E. V.; Hoyles, L.; McArthur, S.

2026-07-06 epidemiology 10.64898/2026.07.02.26357012 medRxiv

Top 0.2%

0.6%

Interest in microbiota-host co-metabolism and the effects of its derived co-metabolites on biological processes is increasing rapidly. In addition to their demonstrated associations with mammalian metabolic health and cognition, microbiota-host co-metabolites (MHCMs) represent lifelong contributors to the endogenous exposome. We have previously shown the MHCM trimethylamine N-oxide (TMAO) to exert beneficial effects on murine blood-brain barrier integrity and cognition. Here we investigated whether these positive neural effects of TMAO extend to humans, analysing how TMAO exposure associates with neurodevelopmental outcomes in children and whether an in vitro human neuronal-astrocyte co-culture could help investigate the underlying mechanism(s) and neuronal processes related to these associations. In a cohort study of childhood mental health (N=1,203), TMAO was associated with fewer internalising problems, while its microbial precursor metabolite trimethylamine was associated with more behavioural problems in both the cross-sectional study and an independent longitudinal study from 1 to 15 years of age (N=630-820). Given prior associations between TMAO exposure and exposure to the environmental pollutants mercury and arsenic, we investigated how the effects of TMAO interacted with these known neurotoxicants. TMAO had a protective effect, modifying the relationship between arsenic exposure and poorer neurodevelopmental outcomes. Furthermore, TMAO activated synaptogenesis-related gene expression and was functionally protective against the negative effects of mercury in our in vitro model. Together, our findings emphasise the importance of interdisciplinary approaches for evaluating the associations of endogenous (MHCM) and exogenous (environmental) metabolites with neurodevelopment, and their potential pathways, in exposome studies.

8

Dentine markers of pre/early postnatal lead exposure link with brain, cognitive, and behavioral outcomes in adolescents

Marshall, A. T.; Kan, E.; Adise, S.; König, M.; McConnell, R.; Martinez, M.; Midya, V.; Arora, M.; Sowell, E. R.

2026-05-27 pediatrics 10.64898/2026.05.26.26354134 medRxiv

Top 0.2%

0.6%

Lead is a toxic metal ubiquitous in our environment. While dramatic reductions in lead sources have paralleled equivalent decreases in lead-poisoning rates, chronic lead exposure remains a critical public health concern. Childhood lead exposure (at its lowest levels) is linked to changes in cognitive development, but less is known about lead's effects on children's brain structure, especially as a result of in utero exposure. We measured prenatal and early-postnatal lead exposure in shed deciduous teeth of 448 9- and 10-year-old children (from 20 United States cities) and linked those lead levels to childhood brain structure, cognition/behavior, and neighborhood- and family-level socioeconomic characteristics. Here we show negative associations between tooth-lead levels and the thickness of the brain's cortex, particularly in regions linked to language processing. With increasing tooth-lead levels, children of lower-income (versus higher-income) families showed steeper declines in receptive vocabulary. Caregiver-reported behavioral problems exhibited similar associations. With in utero exposure linked to adverse neurodevelopmental outcomes (well before lead exposure and its risks are evaluated by healthcare professionals), prenatal screening of maternal lead levels/exposure, coupled with recommended strategies to reduce its placental transmission, may help reduce lead's effects on future generations.

9

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.2%

0.6%

Currently, the genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of variants of uncertain significance (VUSs) and clinical misinterpretations of genomic data, especially in Middle Eastern populations. Whole-exome sequencing was conducted on 90 healthy individuals from Jordan, and the data were analysed using principal component analysis (PCA) and multi-computational filtering. PCA revealed a dual-ancestry (EUR-AFR) admixture rather than a triple (EUR-AFR-AMR) admixture. More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p-value < 0.05). Consequently, the results suggest reclassifying VUSs in the ECE2 gene as likely benign, and variants with conflicting classifications of pathogenicity in the IL1RN and THPO genes as benign, based on a significantly elevated allele frequency (AF = 0.0389, p-value < 0.05). Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs in order to minimize or even prevent clinical misdiagnosis and highlight the unique genetic signature of the Jordanian population. The study serves as a foundational resource for precision medicine in the region.
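The enrichment test named in the abstract can be sketched with the standard library: a two-sided Fisher's exact test on a 2x2 table of allele counts, followed by Benjamini-Hochberg adjustment across variants. Laying out each table as study alternate/reference allele counts versus database alternate/reference allele counts (derived from the maximum allele frequency and allele number) is an assumption, since the abstract does not give the exact table construction, and all counts below are hypothetical. For production work, SciPy's `scipy.stats.fisher_exact` and (SciPy 1.11 or later) `scipy.stats.false_discovery_control` provide tested implementations.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) given fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so ties are not lost to floating-point error.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical tables: [[alt_study, ref_study], [alt_db, ref_db]] per variant.
tables = [(7, 173, 40, 99960), (2, 178, 150, 99850), (1, 179, 900, 99100)]
pvals = [fisher_exact_two_sided(*t) for t in tables]
print(benjamini_hochberg(pvals))
```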

10

The Genetic Landscape and Epidemiological Characteristics of Inherited Retinal Diseases in the Chinese Population

Zeng, B.; Cui, Z.; Zhou, S.; Dai, W.

2026-05-29 ophthalmology 10.64898/2026.05.27.26354224 medRxiv

Top 0.2%

0.6%

Background: Inherited retinal diseases (IRDs) are a group of genetically heterogeneous blinding conditions. Major global genomic reference databases are disproportionately enriched for individuals of European ancestry. The resulting underrepresentation of other ancestries creates a significant bias that impedes the accuracy of genetic diagnosis in the Chinese population. This study aims to address this limitation by constructing a comprehensive genetic landscape of IRDs using deep-sequencing data from a large Chinese cohort. Methods: The study leveraged variant data primarily from 10,588 individuals in the China Metabolic Analytics Project (ChinaMAP) and cross-referenced findings against multiple national and international databases. We systematically curated variants within a targeted panel of 291 IRD-associated genes. Variant pathogenicity was assessed using a comprehensive pipeline integrating InterVar automated classification based on the 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, ClinVar evidence (review status ≥ 1 star), and manual literature curation. We delineated the mutational spectrum, identified population-enriched pathogenic/likely pathogenic (P/LP) variants, and analyzed the distribution characteristics of highly mutated IRD-associated genes. Furthermore, we calculated the carrier frequencies (CF) and genetic prevalence (GP) of autosomal recessive IRD (AR-IRD) genes in the Chinese population. Results: The study revealed a highly concentrated genetic landscape for AR-IRDs in the Chinese population, with ABCA4 and USH2A emerging as the primary drivers of the genetic burden. This finding aligns with previous Chinese cohorts but contrasts with global databases, where genes such as the X-linked RPGR are more prevalent. In contrast, autosomal dominant IRDs (AD-IRDs) exhibited high locus heterogeneity, with pathogenic variants dispersed across numerous genes (e.g., COL2A1 and MFN2).
We identified a series of P/LP variants that were either high-frequency or significantly enriched in the Chinese population, such as CNGB1 (p.P530R) and specific recurrent alleles in ABCA4 and CYP4V2. The estimated cumulative CF for AR-IRDs was 1 in 5.60, and the theoretical total GP was 1 in 2,624.67, based on the ChinaMAP data. Conclusion: By integrating the ChinaMAP dataset with diverse genomic resources, this study provides a genetic landscape of IRDs in the Chinese population. Our analysis shows a concentrated mutational spectrum in AR-IRDs, contrasting with the pronounced heterogeneity in AD-IRDs. These findings, including population-specific pathogenic variants and refined prevalence estimates, provide a resource for precision diagnostics, genetic counseling, expanded carrier screening (ECS), and public health policy development in China.
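The abstract reports CF and GP as "1 in N" figures. A minimal sketch of how such figures are commonly derived is shown below, assuming Hardy-Weinberg equilibrium, independence across genes, and per-gene summed P/LP allele frequencies as input. Whether the authors used exactly these formulas (for example, summing per-gene CF instead of the complement-product used here) is not stated in the abstract, and the gene names and frequencies are hypothetical.

```python
from math import prod


def ar_carrier_burden(allele_freqs):
    """Per-gene and cumulative carrier frequency (CF) and genetic prevalence (GP).

    `allele_freqs` maps gene -> summed frequency q of its P/LP alleles.
    Assumes Hardy-Weinberg equilibrium and independence between genes:
    CF = 2q(1 - q) (heterozygote frequency), GP = q**2 (biallelic frequency).
    """
    per_gene = {g: (2 * q * (1 - q), q * q) for g, q in allele_freqs.items()}
    # Probability of carrying a P/LP allele in at least one gene.
    cumulative_cf = 1 - prod(1 - cf for cf, _ in per_gene.values())
    # Rare-disease approximation: total prevalence as the sum over genes.
    total_gp = sum(gp for _, gp in per_gene.values())
    return per_gene, cumulative_cf, total_gp


def one_in(freq):
    """Express a frequency as '1 in N'."""
    return f"1 in {1 / freq:,.2f}"


# Hypothetical per-gene P/LP allele frequencies.
per_gene, cf, gp = ar_carrier_burden({"GENE_1": 0.01, "GENE_2": 0.02})
print(one_in(cf), one_in(gp))
```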

11

Deciphering the limitations of immortalized hepatocyte cell lines for the study of liver cis-regulatory elements

Bellesis, A.; Li, X.; Moore-Frederick, D.; Xu, D.; Delbridge, K.; Ma, J.; Vaccaro, G.; Edward, B. A. A.; Kellogg, M.; Creeger, Y.; Okamoto, A. S.; Kaplow, I. M.

2026-06-09 genomics 10.64898/2026.06.05.730479 medRxiv

Top 0.3%

0.6%

Immortalized cell lines are widely used in biological research despite their known differences from their tissues and cell types of origin. Such cell lines are especially popular for testing hypotheses regarding the activity of cis-regulatory elements (CREs) that regulate gene expression. Previous investigations of blood and skin cell lines revealed many differences between the transcriptional regulatory networks of the cell lines and the associated primary cells. Similar comparisons for other tissues have been limited. Here, we used ATAC-seq to profile CREs in four immortalized liver cell lines and found many differences between each cell line's CREs and those of primary liver tissue, including differences in the transcription factors likely to bind them and in the genes they are likely to regulate. Modifying cell culture conditions based on recommendations in the literature did not improve the similarity with primary liver tissue. Our results suggest that differences between the transcriptional regulatory networks in cell lines and primary tissue should be considered when designing and interpreting cell line experiments.

12

An Integrated Computational Approach to Predict and Characterize Emerging Mutations in the Japanese Encephalitis Virus Envelope Protein

Thippeswamy, H.; Suresh, D. K. P.; Pandey, R. K.; Sekar, Y. S.; Ramesh, V.; Kamble, N.; Palavesam, A.; Patil, S. S.; Hirematha, J.

2026-05-26 bioinformatics 10.64898/2026.05.26.727781 medRxiv

Top 0.3%

0.6%

Japanese encephalitis virus (JEV) causes significant encephalitis across the Asia-Pacific region. Current vaccines target historical genotype III strains, but emerging genotypes, potentially driven by vaccine-mediated selective pressure, threaten vaccine effectiveness through altered envelope (E) protein sequences that may reduce antibody cross-neutralisation. This study employed integrated sequence and structural analyses to identify E protein mutations affecting neutralising antibody binding and protein stability. JEV polyprotein sequences were curated from NCBI and aligned by multiple sequence alignment, and Shannon entropy was used to pinpoint highly variable positions. Mutations occurring at ≥1% frequency within high-entropy regions were selected for analysis. From 34 initially identified mutations, four candidates were prioritized based on structural stabilization potential. Mutations were evaluated through FoldX stability predictions, molecular docking with antibody 2H4 using HADDOCK3, and molecular dynamics simulations. Binding energies were calculated using MM-GBSA analysis. All mutant E-2H4 complexes remained stable during simulations, with root-mean-square deviation plateauing after equilibration and minimal localized changes in root-mean-square fluctuation. These findings suggest that substitutions in E protein domain III (EDIII) represent important candidates for further investigation to understand genotype-specific variations and inform next-generation vaccine development strategies against emerging JEV strains.
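The variant-screening step (Shannon entropy per alignment column, then mutations at ≥1% frequency in high-entropy positions) can be sketched as follows. The entropy cutoff is not stated in the abstract, so it is left as a parameter; using each column's most common residue as the reference and ignoring gaps ('-') and unknown residues ('X') are assumptions, and the toy alignment is hypothetical.

```python
from collections import Counter
from math import log2

IGNORE = set("-X")  # gap and unknown-residue symbols (assumption)


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    residues = [r for r in column if r not in IGNORE]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * log2(k / n) for k in Counter(residues).values())


def variable_positions(alignment, entropy_cutoff, min_freq=0.01):
    """Yield (position, residue, frequency) for non-consensus residues seen at
    >= min_freq in columns whose entropy exceeds entropy_cutoff (1-based positions)."""
    for i, column in enumerate(zip(*alignment), start=1):
        if column_entropy(column) <= entropy_cutoff:
            continue
        counts = Counter(r for r in column if r not in IGNORE)
        if not counts:
            continue
        consensus, _ = counts.most_common(1)[0]  # ties: first residue encountered
        total = sum(counts.values())
        for residue, k in counts.items():
            if residue != consensus and k / total >= min_freq:
                yield i, residue, k / total


# Hypothetical toy alignment of protein fragments.
alignment = ["MKV", "MKA", "MRA", "MKV"]
print([round(column_entropy(c), 3) for c in zip(*alignment)])
print(list(variable_positions(alignment, entropy_cutoff=0.5)))
```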

13

Novel Drosophila cis-regulatory elements can be uncovered by footprinting transcription factor binding sites in ATAC-seq data

Mei, C.; Ness, J.; Nakai, K.; Wunderlich, Z.

2026-06-25 genomics 10.64898/2026.06.22.733832 medRxiv

Top 0.3%

0.6%

Developmental processes depend on carefully coordinated gene expression. Expression is modulated by the binding of transcription factors (TFs) to cis-regulatory elements (CREs), such as enhancers and promoters. Many computational and experimental approaches have been developed to find CREs, particularly enhancers, in the genome, each with strengths and caveats. Given the increasing availability of ATAC-seq data and methods to find TF binding therein, we hypothesized that we could use TF footprinting tools to find clusters of TF binding events within accessible chromatin that may act as CREs. Using the Drosophila anterior-posterior patterning network as a test bed, we applied a digital genomic footprinting tool (DGT), TOBIAS, to previously published early embryo ATAC-seq data to characterize the TF footprint landscape of 16 TFs essential for embryonic patterning. Even in this system, with its extensive enhancer annotation, most footprinted TF binding sites lie outside known enhancers, with intergenic and intronic regions hosting the highest TF footprint counts, albeit at low density. To find potential novel enhancers, we identified high-density TF footprint clusters that are highly conserved and overlap with active enhancer histone mark signals. Five high-confidence candidates were selected for reporter assay validation, and all five were found to drive spatially patterned expression in the embryo. This study shows that even in a highly characterized system, the analysis of footprinted TF binding sites in ATAC-seq data can uncover new regulatory regions, and it suggests that this approach may help find novel CREs in existing ATAC-seq data.
ARTICLE SUMMARY: Given the increasing availability of ATAC-seq datasets, workflows to exploit the data to uncover new cis-regulatory elements (CREs), including enhancers, are valuable. Using early anterior-posterior patterning in the Drosophila embryo as a test case, we find that previously published ATAC-seq data can be analyzed with existing transcription factor footprinting tools to yield new candidate CREs. Experimental validation confirms the activity of selected candidate CREs, suggesting that existing data can be analyzed to find novel regulatory elements.
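One simple way to turn footprinted binding sites into candidate CRE clusters is gap-based grouping along a chromosome. The sketch below is a stand-in, not the study's density criterion: `max_gap` and `min_sites` are hypothetical parameters, the site list would in practice come from a footprinting tool such as TOBIAS, the positions in the example are invented, and the conservation and histone-mark filters described in the abstract are not included.

```python
def footprint_clusters(sites, max_gap, min_sites):
    """Group footprinted TF binding sites on one chromosome into clusters.

    sites: list of (position, tf_name). Consecutive sites no more than max_gap bp
    apart join the same cluster; clusters with fewer than min_sites are dropped.
    Returns (start, end, n_sites, sorted distinct TF names) tuples.
    """
    clusters, current = [], []
    for pos, tf in sorted(sites):
        if current and pos - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append((pos, tf))
    if current:
        clusters.append(current)
    return [
        (c[0][0], c[-1][0], len(c), sorted({tf for _, tf in c}))
        for c in clusters
        if len(c) >= min_sites
    ]


# Hypothetical footprint positions for patterning TFs on one chromosome arm.
sites = [(100, "bcd"), (150, "hb"), (180, "Kr"), (1000, "bcd"), (1300, "gt"), (1320, "kni")]
print(footprint_clusters(sites, max_gap=100, min_sites=3))
```

Counting distinct TFs per cluster, not just sites, is one way to favor regions bound by several patterning factors.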

14

A Curated Genome-Scale Nucleotide Diversity Panel of Non-Human Primates

Pankratov, V.; Meyer Pedersen, B.; Fogh Sorensen, E.; Munch, K.; Bataillon, T.; Schierup, M. H.; Bergman, J.

2026-06-17 genomics 10.64898/2026.06.16.732573 medRxiv

Top 0.4%

0.5%

Background: Primates constitute one of the most phylogenetically and ecologically diverse Eutherian mammalian orders, with a central role in advancing our knowledge of human evolution, speciation processes and conservation biology. While thousands of whole-genome sequences have been generated across a multitude of primate taxa, discrepancies in data processing - particularly the lack of ploidy-aware variant calling in sex-linked regions - have limited the utility of existing datasets for large-scale comparative analyses. Results: Here, we utilized publicly available short-read sequencing data of non-human primates, recently published primate genome assemblies and a ploidy-aware variant calling procedure to generate a genome-scale nucleotide diversity panel comprising 3,240 individuals from 269 species and 71 genera. To further facilitate cross-species comparisons, we generated a multiple-genome alignment of primate assemblies used for variant calling. Conclusion: This curated resource of non-human primate diversity provides a foundation for future research in primate evolutionary biology, speciation, and sex chromosome evolution (https://pure.au.dk/portal/en/datasets/primate-diversity-panel/).

15

Genome-wide meQTL mapping in cattle blood reveals cis and trans regulation of DNA methylation

Fouere, C.; Costes, V.; Besnard, F.; Le Danvic, C.; Patry, C.; Fritz, S.; Boussaha, M.; Jouin, M.; Boichard, D.; Kiefer, H.; Costa Monteiro Moreira, G.; Sanchez, M.-P.

2026-07-08 genetics 10.64898/2026.07.07.736355 medRxiv

Top 0.4%

0.4%

Background Complex traits are influenced by numerous variants, most of which have regulatory effects on gene expression that can be mediated by DNA methylation. Molecular QTL mapping is an approach that aims to dissect these effects. However, obtaining molecular phenotypes on a large scale is challenging, particularly in livestock species. In cattle, an epigenotyping array called EpiChip has recently been developed in the European RUMIGEN project. The EpiChip, which contains 43,317 CpG sites distributed across the bovine genome, enables large-scale measurement of DNA methylation. This study aims to characterize the genetic basis of blood DNA methylation in cows by estimating heritability and mapping cis- and trans-methylation QTLs (meQTLs). Results Whole blood samples from 4,457 genotyped Holstein cows were epigenotyped. Across all CpG sites, heritability estimates averaged 24.6%. Local sequence-level meQTL mapping for variable CpG sites (SD > 2.5%; n = 28,806) detected cis-meQTLs for 80.1% of the CpG sites, with sentinel SNPs located close to their associated CpGs. A two-step analysis was also conducted to identify long-range associations, with a particular focus on trans-meQTL hotspots. First, we identified CpG-SNP trans-associations using medium-density genotypes (50k SNPs), which revealed 31,846 SNPs with significant effects on 1 to 530 trans-CpG sites. Then, regions associated with at least 34 independent trans-CpGs were retained, defining 31 hotspots. For each hotspot, a local sequence-level GWAS was conducted using the first principal component derived from the associated trans-CpGs. Of the 31 detected hotspots, three were located close to transcription factor genes (RUNX1, NFIC and FOXA3) for which the associated trans-CpGs were enriched for the corresponding binding motif.
Two other hotspots were located within KDM5A and KDM5B, and their corresponding trans-CpGs were strongly overrepresented in H3K4me3 narrow peaks in blood as well as in other tissues. Conclusions By identifying functional candidate genes associated with blood DNA methylation in cattle, these findings provide new insights into the regulatory architecture of DNA methylation in mammals, highlighting the value of large-scale molecular data from livestock populations.

16

fourSynergy: Ensemble-based interaction calling on 4C-seq data using gradient-free optimization

Wind, S.-M.; Plagwitz, L.; Dix, J.; Heidtmann, G.; Heider, D.; Walter, C.

2026-06-01 bioinformatics 10.64898/2026.05.27.728108 medRxiv

Top 0.5%

0.4%

Motivation: Chromatin organization plays a crucial role in gene regulation and is associated with various severe diseases such as cancer. Since chromatin changes are potentially reversible, a deeper understanding of these alterations could be harnessed for the development of new therapies. Circular Chromosome Conformation Capture Sequencing (4C-seq) is a sequencing technique enabling the identification of chromatin interactions between genes and regulatory elements. This work aims to develop an ensemble algorithm that exploits synergies among available 4C-seq tools to achieve superior predictive performance in interaction calling. Results: We combined existing 4C-seq algorithms using a weighted-voting approach. By optimizing the tool weights according to various predictive metrics using gradient-free optimization strategies, we demonstrate the potential of combining multiple 4C-seq analysis tools for interaction calling. Our results indicate that a weighted-voting ensemble approach can outperform individual algorithms on various datasets. Although the optimal solutions differ across the 4C-seq datasets, we successfully identified global solutions that outperform the individual algorithms on all datasets analyzed. Availability: https://github.com/sophiewind/fourSynergy, https://github.com/sophiewind/fourSynergy_pip Contact: sophie.wind@uni-muenster.de Supplementary information: Supplementary data are available at Journal Name online.
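The weighted-voting idea can be sketched as follows: each tool casts a binary interaction call per genomic bin, calls are combined with per-tool weights, and the weights are tuned by a gradient-free search against a reference set. This is a minimal illustration under stated assumptions, not fourSynergy's implementation: bin-level binary calls, F1 as the objective, and plain random search as the optimizer are choices made here, and the tool names and calls are hypothetical. The actual tools, metrics and optimizers are in the linked repositories.

```python
import random


def weighted_vote(calls, weights, threshold=0.5):
    """Ensemble call per bin: 1 if the weight-normalized vote reaches `threshold`.

    calls: tool name -> list of 0/1 interaction calls over the same bins.
    """
    tools = list(calls)
    n_bins = len(calls[tools[0]])
    total = sum(weights[t] for t in tools)
    if total <= 0:
        return [0] * n_bins
    return [
        int(sum(weights[t] * calls[t][i] for t in tools) / total >= threshold)
        for i in range(n_bins)
    ]


def f1_score(pred, truth):
    """F1 of binary predictions against binary ground truth."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def random_search_weights(calls, truth, n_iter=500, seed=0):
    """Gradient-free weight tuning by plain random search, maximizing F1."""
    rng = random.Random(seed)
    best_w = {t: 1.0 for t in calls}  # start from an unweighted majority vote
    best = f1_score(weighted_vote(calls, best_w), truth)
    for _ in range(n_iter):
        w = {t: rng.random() for t in calls}
        score = f1_score(weighted_vote(calls, w), truth)
        if score > best:
            best_w, best = w, score
    return best_w, best


# Hypothetical per-bin calls from three tools and a reference set.
calls = {"tool_A": [1, 1, 0, 0], "tool_B": [1, 0, 1, 0], "tool_C": [0, 1, 1, 1]}
truth = [1, 1, 0, 0]
print(f1_score(weighted_vote(calls, {t: 1.0 for t in calls}), truth))
weights, score = random_search_weights(calls, truth)
print(weights, score)
```

In this toy case the unweighted majority vote scores F1 = 0.8, while weights that let tool_A outvote the other two reproduce the reference exactly, which mirrors the abstract's point that tuned weights can beat both the individual tools and a naive ensemble.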

17

Endosulfan rewires PKA and GSK3β to disrupt primary cilia-dependent Hedgehog signalling

Piyush, R.; Barmola, H.; Bhattacharjya, A.; Gupta, A.; Bhaumik, P.; Raghavan, S. C.; Choudhary, B.; Gadadhar, S.; Rao, S.; Shinde, S. R.

2026-07-09 cell biology 10.64898/2026.07.03.736336 medRxiv

Top 0.5%

0.4%

Primary cilium-dependent Hedgehog signalling is essential for embryonic development, tissue patterning, and organ homeostasis, and its disruption causes a spectrum of developmental disorders collectively termed ciliopathies. Whether environmental toxicants can chemically induce ciliopathy-like states by targeting this pathway, however, remains poorly understood. Here we show that endosulfan, a banned organochlorine pesticide epidemiologically linked to severe congenital and reproductive defects in exposed human populations, disrupts Hedgehog signalling by driving GLI transcription factor processing into repressor forms and suppressing target gene expression at both transcriptional and protein levels. Having excluded direct effects on core ciliary receptors and GLI-DNA binding, we identify the pathway kinases PKA and GSK3β as direct targets of endosulfan: endosulfan increases PKA activity through allosteric fine-tuning and, in a pharmacologically rare finding, acts as the first reported small-molecule activator of GSK3β, shifting the kinase toward a catalytically active conformation. We further identify Cetn3 and Cep250 as novel GLI-regulated genes required for centriole cohesion, both of which are repressed upon endosulfan exposure, providing a mechanistic link to the reproductive defects reported in exposed populations and animal models. These findings identify endosulfan as a candidate chemical inducer of ciliopathy and reveal how an environmental toxicant can hijack core kinase signalling to disrupt Hedgehog-dependent development.

18

Transcriptional regulation of the type II fatty acid synthase complex-encoding gene cluster in Rhodococcus opacus

Leemans, P. G. C.; Van Eupen, A.; Bervoets, I.; Peeters, E.; Cornet, I.

2026-06-13 microbiology 10.1101/2025.09.25.678588 medRxiv

Top 0.5%

0.4%


Rhodococcus opacus is an oleaginous actinobacterium with considerable potential for lipid-based bioproduction, as well as for utilising a variety of carbon sources as substrates, including renewable, cost-effective resources. Although its capacity for triacylglycerol accumulation is well established, the regulatory logic that governs its fatty acid and mycolic acid biosynthesis is still poorly understood. Here, we investigated the transcriptional control of the type II fatty acid synthase (FASII) pathway in R. opacus PD630, revealing a regulatory architecture that is more complex than previously assumed. Differential gene expression analysis showed that environmental cues, including temperature, pH, carbon-to-nitrogen ratio and the presence of free fatty acids influence the FASII gene cluster expression in a non-uniform manner. This phenomenon suggests the presence of internal transcription start sites and modular regulation within the cluster. We identified three lipid-responsive transcription factors, MabRRO, FadR1RO and FadR2RO, that are all capable of binding the fasII promoter in vitro. DNA binding of FadR1RO and FadR2RO was disrupted by long-chain acyl-CoA molecules, indicating ligand-dependent control. Together, these findings reveal previously unrecognised layers of transcriptional regulation in the R. opacus FASII pathway and highlight both conserved and divergent regulatory features within the Mycobacteriales lineage.

19

Heavy metal exposure and conditional survival time in U.S. adults: a censored quantile regression cohort study

Fang, X.; Schwartz, J.

2026-07-09 epidemiology 10.64898/2026.06.29.26356268 medRxiv

Top 0.6%

0.3%


Background. Chronic low-level exposure to lead, cadmium, mercury, and arsenic remains a determinant of premature mortality in the U.S. general population, but previous hazard-ratio analyses do not characterize how exposure shifts the lower tail of the survival distribution, where premature mortality is concentrated. Objectives. We estimated the association of whole-blood lead, whole-blood total mercury, urinary cadmium, and the sum of urinary inorganic and methylated arsenic species with the 10th, 25th, and 50th conditional quantiles of follow-up time to all-cause mortality among U.S. adults aged 40 years and older. Methods. NHANES Continuous 1999 to 2018 was linked to the National Death Index through December 31, 2019 (n = 29,652). Censored quantile regression was fit separately for each metal on the log2 scale at quantiles τ ∈ {0.10, 0.25, 0.50}. A restricted-cubic-spline (RCS) censored quantile regression was fit for blood lead and urinary cadmium to investigate threshold effects. Results. Over a median follow-up of 9.1 years, 7,215 deaths were ascertained. A doubling of urinary cadmium was associated with -1.57 years of follow-up (95% CI: -2.08, -1.07) at the 10th conditional quantile, -1.50 (-2.04, -0.96) at the 25th, and -1.49 (-1.93, -1.04) at the median (Benjamini-Hochberg q < 0.001 throughout). A doubling of whole-blood lead was associated with -0.70 years (95% CI: -0.99, -0.40) at the 10th conditional quantile, -0.62 (-0.92, -0.31) at the 25th, and -0.61 (-0.89, -0.34) at the median; for both metals, the absolute loss was largest at τ = 0.10. The urinary arsenic-metabolite sum was not associated with conditional follow-up at the estimable quantiles. Despite adjustment for dark and fatty-fish intake or DHA/EPA, whole-blood total mercury was associated with longer follow-up (i.e., lower mortality risk), possibly reflecting residual confounding by broader dietary or socioeconomic factors rather than a true protective effect. The cadmium association was also robust to mutual adjustment for lead. Discussion. Low-to-moderate urinary cadmium and whole-blood lead were associated with shorter follow-up survival at the lower-tail and median conditional quantiles, with the largest absolute losses in the lower tail of the conditional survival distribution, where premature mortality is concentrated. These findings support continued reductions in U.S. cadmium and lead exposure, with particular benefit for adults most vulnerable to premature death.
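Because each metal enters the model on the log2 scale, its coefficient reads directly as "years of follow-up per doubling". The worked derivation below sketches why, assuming (as the per-doubling effects reported in years suggest) that the τ-th conditional quantile of follow-up time T is linear in log2 concentration c with other covariates z; the symbols β₀(τ), β₁(τ), γ(τ), and z are illustrative notation, not taken from the paper. The last line extrapolates to a four-fold increase, which holds only under this linear-in-log2 form (the RCS models test departures from it for lead and cadmium).

```latex
\begin{aligned}
Q_{\tau}(T \mid c, \mathbf{z}) &= \beta_0(\tau) + \beta_1(\tau)\log_2 c + \boldsymbol{\gamma}(\tau)^{\top}\mathbf{z},\\
Q_{\tau}(T \mid 2c, \mathbf{z}) - Q_{\tau}(T \mid c, \mathbf{z}) &= \beta_1(\tau)\left[\log_2(2c) - \log_2 c\right] = \beta_1(\tau),\\
Q_{\tau}(T \mid kc, \mathbf{z}) - Q_{\tau}(T \mid c, \mathbf{z}) &= \beta_1(\tau)\log_2 k,\\
\text{cadmium, } \tau = 0.10,\ k = 4: \quad & -1.57 \times \log_2 4 = -1.57 \times 2 = -3.14 \text{ years}.
\end{aligned}
```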

20

Machine Learning-based Prediction of Preterm Birth Using Genetic Data

Sundelin, H.; Jacobsson, B.; Ytterberg, K.; Sole-Navais, P.; Juodakis, J.

2026-06-26 genetic and genomic medicine 10.64898/2026.06.24.26356330 medRxiv

Top 0.6%

0.3%


The leading cause of mortality and morbidity in children under the age of 5 is preterm birth. The timing of birth is influenced by both genetic and environmental factors, but the underlying mechanisms remain poorly understood, making its prediction difficult. In this study, we investigated the potential of using machine learning models to predict preterm birth based on genetic data from the Norwegian Mother, Father and Child Cohort Study (MoBa). We trained and evaluated several classification algorithms on individual-level genetic data from over 15,000 mothers and children. Our results indicate that the predictive capacity of maternal gestational duration-associated loci for preterm birth is limited, with the highest AUC values around 0.57. Additionally, incorporating more SNPs within the associated loci did not improve prediction performance. As expected, the contribution of the maternal genome to preterm birth prediction was found to be larger than that of the fetal genome. Overall, our findings suggest that while genetic testing provides some information about an individual's risk for preterm birth, further research incorporating additional factors is necessary to enhance predictability.
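An AUC of about 0.57 can be read as the probability that a randomly chosen preterm birth receives a higher risk score than a randomly chosen term birth, where 0.5 is chance. The sketch below computes AUC in exactly that pairwise (Mann-Whitney) form, counting ties as half. It is illustrative only: the scores and labels are hypothetical, and this is not the study's evaluation pipeline.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case scores higher.

    labels: 1 = case (e.g. preterm birth), 0 = control; tied pairs count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for two cases and two controls.
print(auc_mann_whitney([0.9, 0.4, 0.6, 0.3], [1, 1, 0, 0]))  # 0.75
```

The quadratic pairwise loop is chosen for clarity; for cohorts of tens of thousands, a rank-based formula over sorted scores gives the same value far more cheaply.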